Interactive Perception
ArtReg: Visuo-Tactile based Pose Tracking and Manipulation of Unseen Articulated Objects
Murali, Prajval Kumar, Kaboli, Mohsen
Robots operating in real-world environments frequently encounter unknown objects with complex structures and articulated components, such as doors, drawers, cabinets, and tools. The ability to perceive, track, and manipulate these objects without prior knowledge of their geometry or kinematic properties remains a fundamental challenge in robotics. In this work, we present a novel method for visuo-tactile-based tracking of unseen objects (single, multiple, or articulated) during robotic interaction, without assuming any prior knowledge of object shape or dynamics. Our pose tracking approach, termed ArtReg (short for Articulated Registration), integrates visuo-tactile point clouds in an unscented Kalman filter formulation on the SE(3) Lie group for point cloud registration. ArtReg is used to detect possible articulated joints in objects using purposeful manipulation maneuvers such as pushing or hold-pulling with a two-robot team. Furthermore, we leverage ArtReg to develop a closed-loop controller for goal-driven manipulation of articulated objects that moves the object into a desired pose configuration. We extensively evaluated our approach on various types of unknown objects through real-robot experiments, and we demonstrate the robustness of our method on objects with varying centers of mass, under low-light conditions, and against challenging visual backgrounds. Furthermore, we benchmarked our approach on a standard dataset of articulated objects and demonstrated improved pose accuracy compared to state-of-the-art methods. Our experiments indicate that robust and accurate pose tracking leveraging visuo-tactile information enables robots to perceive and interact with unseen complex articulated objects (with revolute or prismatic joints).
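For a concrete picture of the distinctive step in an unscented Kalman filter on SE(3): sigma points are drawn in the 6-dimensional tangent space and mapped onto the group through the exponential map. Below is a minimal NumPy sketch of that step; the function names, the (rho, omega) ordering, and the scaling parameter `lam` are illustrative choices, not details from the paper, and the standard UKF measurement update would follow in the tangent space.

```python
import numpy as np

def hat(w):
    """Skew-symmetric 3x3 matrix of a 3-vector w."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def se3_exp(xi):
    """Exponential map: 6-vector (rho, omega) -> 4x4 homogeneous transform."""
    rho, omega = xi[:3], xi[3:]
    theta = np.linalg.norm(omega)
    W = hat(omega)
    if theta < 1e-8:                      # small-angle Taylor expansion
        R = np.eye(3) + W
        V = np.eye(3) + 0.5 * W
    else:
        A = np.sin(theta) / theta
        B = (1.0 - np.cos(theta)) / theta**2
        C = (theta - np.sin(theta)) / theta**3
        R = np.eye(3) + A * W + B * (W @ W)
        V = np.eye(3) + B * W + C * (W @ W)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ rho
    return T

def sigma_points_se3(T_mean, P, lam=1.0):
    """2n+1 sigma points: right-perturb the mean pose in its tangent space."""
    n = 6
    L = np.linalg.cholesky((n + lam) * P)  # P is the 6x6 pose covariance
    pts = [T_mean]
    for i in range(n):
        pts.append(T_mean @ se3_exp(L[:, i]))
        pts.append(T_mean @ se3_exp(-L[:, i]))
    return pts
```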
RoboRetriever: Single-Camera Robot Object Retrieval via Active and Interactive Perception with Dynamic Scene Graph
Wang, Hecheng, Ren, Jiankun, Yu, Jia, Qi, Lizhe, Sun, Yunquan
Humans effortlessly retrieve objects in cluttered, partially observable environments by combining visual reasoning, active viewpoint adjustment, and physical interaction, all with only a single pair of eyes. In contrast, most existing robotic systems rely on carefully positioned fixed or multi-camera setups with complete scene visibility, which limits adaptability and incurs high hardware costs. We present RoboRetriever, a novel framework for real-world object retrieval that operates using only a single wrist-mounted RGB-D camera and free-form natural language instructions. RoboRetriever grounds visual observations to build and update a dynamic hierarchical scene graph that encodes object semantics, geometry, and inter-object relations over time. A supervisor module reasons over this memory and the task instruction to infer the target object and coordinate an integrated action module combining active perception, interactive perception, and manipulation. To enable task-aware, scene-grounded active perception, we introduce a novel visual prompting scheme that leverages large reasoning vision-language models to determine 6-DoF camera poses aligned with the semantic task goal and the geometric scene context. We evaluate RoboRetriever on diverse real-world object retrieval tasks, including scenarios with human intervention, demonstrating strong adaptability and robustness in cluttered scenes with only one RGB-D camera.
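As a rough illustration of what a dynamic hierarchical scene graph could look like in code, here is a hypothetical sketch; the class names, fields, and relation triples are assumptions for illustration, not RoboRetriever's actual data structures.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectNode:
    """One object in the scene: semantics plus coarse geometry."""
    name: str            # e.g. "red mug"
    centroid: tuple      # (x, y, z) in the world frame
    last_seen: float     # timestamp of the most recent observation

@dataclass
class SceneGraph:
    """Dynamic scene graph: object nodes plus (subject, predicate, object) relations."""
    nodes: dict = field(default_factory=dict)
    relations: set = field(default_factory=set)

    def update(self, node: ObjectNode, rels=()):
        """Insert or refresh an object and replace its outgoing relations."""
        self.nodes[node.name] = node
        self.relations = {r for r in self.relations if r[0] != node.name}
        self.relations |= set(rels)

g = SceneGraph()
g.update(ObjectNode("red mug", (0.4, 0.1, 0.8), last_seen=12.3),
         rels={("red mug", "on", "shelf"), ("red mug", "behind", "cereal box")})
```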
Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking
Wang, Xi, Chen, Tianxing, Yu, Qiaojun, Xu, Tianling, Chen, Zanxin, Fu, Yiting, Lu, Cewu, Mu, Yao, Luo, Ping
Articulated object manipulation requires precise object interaction, where the object's axis must be carefully considered. Previous research has employed interactive perception for manipulating articulated objects, but open-loop approaches often suffer from overlooking the interaction dynamics. To address this limitation, we present a closed-loop pipeline integrating interactive perception with online axis estimation from segmented 3D point clouds. Our method builds on any interactive perception technique, inducing slight object movement to generate point cloud frames of the evolving dynamic scene. These point clouds are then segmented using the Segment Anything Model 2 (SAM2), after which the moving part of the object is masked for accurate online estimation of the motion axis, guiding subsequent robotic actions. Our approach significantly enhances the precision and efficiency of manipulation tasks involving articulated objects. Experiments in simulated environments demonstrate that our method outperforms baseline approaches, especially in tasks that demand precise axis-based control. Project Page: https://hytidel.github.io/video-tracking-for-axis-estimation/.
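To make the axis-estimation step concrete: given corresponding points on the moving part in two point-cloud frames, the rigid transform between them can be recovered with the Kabsch algorithm, and a revolute axis extracted from its rotation. A minimal sketch under those assumptions (correspondences given; function names ours, not the paper's):

```python
import numpy as np

def rigid_transform(P, Q):
    """Kabsch: least-squares R, t such that Q_i ~ R @ P_i + t, for Nx3 corresponded points."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cQ - R @ cP

def revolute_axis(R, t):
    """Axis direction and one point on the axis for a revolute motion (R, t)."""
    w, V = np.linalg.eig(R)
    axis = np.real(V[:, np.argmin(np.abs(w - 1.0))])  # eigenvector for eigenvalue 1
    axis /= np.linalg.norm(axis)
    # A fixed point p satisfies (I - R) p = t; the system is rank-deficient along
    # the axis, so least squares returns the minimum-norm point on the axis.
    p, *_ = np.linalg.lstsq(np.eye(3) - R, t, rcond=None)
    return axis, p
```

For a prismatic joint, R is close to the identity and the axis direction is simply t normalized.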
Interactive Perception for Deformable Object Manipulation
Weng, Zehang, Zhou, Peng, Yin, Hang, Kravberg, Alexander, Varava, Anastasiia, Navarro-Alarcon, David, Kragic, Danica
Interactive perception enables robots to manipulate the environment and objects to bring them into states that benefit the perception process. Deformable objects pose challenges to this paradigm due to significant manipulation difficulty and occlusion in vision-based perception. In this work, we address this problem with a setup involving both an active camera and an object manipulator. Our approach is based on a sequential decision-making framework and explicitly considers motion regularity and structure in coupling the camera and manipulator. We contribute a method for constructing and computing a subspace, called Dynamic Active Vision Space (DAVS), for effectively exploiting this regularity in motion exploration. The effectiveness of the framework and approach is validated in both simulation and a real dual-arm robot setup. Our results confirm the necessity of an active camera and coordinated motion in interactive perception for deformable objects.
Enhancing Deformable Object Manipulation By Using Interactive Perception and Assistive Tools
In the field of robotic manipulation, proficiency at deformable object manipulation lags behind human capabilities due to the inherent characteristics of deformable objects. These objects have infinite degrees of freedom, resulting in non-trivial perception and state estimation, and complex dynamics, complicating the prediction of future configurations. Although recent research has focused on deformable object manipulation, most approaches rely on static vision and simple manipulation techniques, limiting performance. This paper proposes two solutions: interactive perception and the use of assistive tools. The first solution posits that optimal perspectives exist during deformable object manipulation, making state estimation easier; by exploiting action-perception regularities, interactive perception improves both manipulation and perception. The second solution advocates the use of assistive tools, a hallmark of human intelligence, to improve manipulation performance. For instance, a folding board can aid garment folding by reducing object deformation and taming complex dynamics. This research therefore aims to address the deformable object manipulation problem by incorporating interactive perception and assistive tools to augment manipulation performance.
Bagging by Learning to Singulate Layers Using Interactive Perception
Chen, Lawrence Yunliang, Shi, Baiyu, Lin, Roy, Seita, Daniel, Ahmad, Ayah, Cheng, Richard, Kollar, Thomas, Held, David, Goldberg, Ken
Many fabric handling and 2D deformable material tasks in homes and industry require singulating layers of material, such as opening a bag or arranging garments for sewing. In contrast to methods requiring specialized sensing or end effectors, we use only visual observations with ordinary parallel-jaw grippers. We propose SLIP: Singulating Layers using Interactive Perception, and apply SLIP to the task of autonomous bagging. We develop SLIP-Bagging, a bagging algorithm that manipulates a plastic or fabric bag from an unstructured state and uses SLIP to grasp the top layer of the bag to open it for object insertion. In physical experiments, a YuMi robot achieves a success rate of 67% to 81% across bags of a variety of materials, shapes, and sizes, significantly improving success rate and generality over prior work. Experiments also suggest that SLIP can be applied to tasks such as singulating layers of folded cloth and garments. Supplementary material is available at https://sites.google.com/view/slip-bagging/.
Self-Supervised Learning for Interactive Perception of Surgical Thread for Autonomous Suture Tail-Shortening
Schorp, Vincent, Panitch, Will, Shivakumar, Kaushik, Viswanath, Vainavi, Kerr, Justin, Avigal, Yahav, Fer, Danyal M, Ott, Lionel, Goldberg, Ken
Accurate 3D sensing of suturing thread is a challenging problem in automated surgical suturing because of the high state-space complexity, thinness and deformability of the thread, and possibility of occlusion by the grippers and tissue. In this work we present a method for tracking surgical thread in 3D which is robust to occlusions and complex thread configurations, and apply it to autonomously perform the surgical suture "tail-shortening" task: pulling thread through tissue until a desired "tail" length remains exposed. The method utilizes a learned 2D surgical thread detection network to segment suturing thread in RGB images. It then identifies the thread path in 2D and reconstructs the thread in 3D as a NURBS spline by triangulating the detections from two stereo cameras. Once a 3D thread model is initialized, the method tracks the thread across subsequent frames. Experiments suggest the method achieves a 1.33-pixel average reprojection error on challenging single-frame 3D thread reconstructions, and a 0.84-pixel average reprojection error on two tracking sequences. On the tail-shortening task, it accomplishes a 90% success rate across 20 trials. Supplemental materials are available at https://sites.google.com/berkeley.edu/autolab-surgical-thread/.
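As a rough sketch of the reconstruction step, one can triangulate matched 2D thread detections from the two stereo views and fit a smooth 3D curve through them. The sketch below uses OpenCV triangulation and a SciPy smoothing B-spline as a stand-in for the paper's NURBS representation; the projection matrices and matched, thread-ordered pixel paths are assumed given.

```python
import numpy as np
import cv2
from scipy.interpolate import splprep, splev

def reconstruct_thread(P1, P2, pts1, pts2, smooth=0.0):
    """P1, P2: 3x4 camera projection matrices; pts1, pts2: 2xN matched thread
    pixels ordered along the thread. Returns a parametric 3D spline."""
    Xh = cv2.triangulatePoints(P1, P2,
                               pts1.astype(np.float64), pts2.astype(np.float64))
    X = Xh[:3] / Xh[3]                              # 3xN Euclidean thread points
    tck, _ = splprep([X[0], X[1], X[2]], s=smooth)  # smoothing B-spline fit
    return tck

# Evaluate 100 samples along a fitted curve:
# xs, ys, zs = splev(np.linspace(0.0, 1.0, 100), tck)
```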
SGTM 2.0: Autonomously Untangling Long Cables using Interactive Perception
Shivakumar, Kaushik, Viswanath, Vainavi, Gu, Anrui, Avigal, Yahav, Kerr, Justin, Ichnowski, Jeffrey, Cheng, Richard, Kollar, Thomas, Goldberg, Ken
Cables are commonplace in homes, hospitals, and industrial warehouses and are prone to tangling. This paper extends prior work on autonomously untangling long cables by introducing novel uncertainty quantification metrics and actions that interact with the cable to reduce perception uncertainty. We present Sliding and Grasping for Tangle Manipulation 2.0 (SGTM 2.0), a system that autonomously untangles cables approximately 3 meters in length with a bilateral robot, using estimates of uncertainty at each step to inform actions. By interactively reducing uncertainty, SGTM 2.0 reduces the number of state-resetting moves it must take, significantly speeding up run-time. Experiments suggest that SGTM 2.0 can achieve 83% untangling success on cables with 1 or 2 overhand and figure-8 knots, and 70% termination detection success across these configurations, outperforming SGTM 1.0 by 43% in untangling accuracy and 200% in full rollout speed. Supplementary material, visualizations, and videos can be found at sites.google.com/view/sgtm2.
Interactive Perception at Toyota Research Institute
Dr. Carolyn Matl, Research Scientist at Toyota Research Institute, explains why Interactive Perception and soft tactile sensors are critical for manipulating challenging objects such as liquids, grains, and dough. She also dives into "StRETcH," a Soft to Resistive Elastic Tactile Hand: a variable-stiffness soft tactile end-effector presented by her research group.

Carolyn Matl is a research scientist at the Toyota Research Institute, where she works on robotic perception and manipulation with the Mobile Manipulation Team. She received her B.S.E. in Electrical Engineering from Princeton University in 2016, and her Ph.D. in Electrical Engineering and Computer Sciences from the University of California, Berkeley in 2021. At Berkeley, she was awarded the NSF Graduate Research Fellowship and was advised by Ruzena Bajcsy. Her dissertation work focused on developing and leveraging non-traditional sensors for robotic manipulation of complicated objects and substances like liquids and doughs.

Would you mind introducing yourself?

Thank you so much for having me on the podcast. I'm Carolyn Matl and I'm a research scientist at the Toyota Research Institute, where I work with a really great group of people on the Mobile Manipulation Team on fun and challenging robotic perception and manipulation problems.
SAGCI-System: Towards Sample-Efficient, Generalizable, Compositional, and Incremental Robot Learning
Lv, Jun, Yu, Qiaojun, Shao, Lin, Liu, Wenhai, Xu, Wenqiang, Lu, Cewu
Building general-purpose robots to perform an enormous number of tasks in a large variety of environments at the human level is notoriously complicated. It requires robot learning that is sample-efficient, generalizable, compositional, and incremental. In this work, we introduce a systematic learning framework called SAGCI-system to achieve these four requirements. Our system first takes as input the raw point clouds gathered by a camera mounted on the robot's wrist and produces an initial model of the surrounding environment represented as a URDF. Our system adopts a learning-augmented differentiable simulation that loads the URDF. The robot then uses interactive perception to interact with the environment, verifying and modifying the URDF online. Leveraging the simulation, we propose a new model-based RL algorithm combining object-centric and robot-centric approaches to efficiently produce policies that accomplish manipulation tasks. We apply our system to articulated object manipulation, both in simulation and in the real world. Extensive experiments demonstrate the effectiveness of our proposed learning framework. Supplemental materials and videos are available at https://sites.google.com/view/egci.
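To give a flavor of the URDF representation the abstract refers to: an environment model with one estimated revolute joint could be serialized as below. This is a generic, hypothetical two-link URDF with placeholder geometry and limits, not SAGCI's actual output format.

```python
def make_urdf(axis, origin, name="articulated_object"):
    """Minimal two-link URDF with one revolute joint at `origin` about `axis`."""
    ax = " ".join(f"{v:.4f}" for v in axis)
    xyz = " ".join(f"{v:.4f}" for v in origin)
    return f"""<robot name="{name}">
  <link name="base"><visual><geometry><box size="0.5 0.5 0.5"/></geometry></visual></link>
  <link name="part"><visual><geometry><box size="0.5 0.5 0.1"/></geometry></visual></link>
  <joint name="joint0" type="revolute">
    <parent link="base"/><child link="part"/>
    <origin xyz="{xyz}"/><axis xyz="{ax}"/>
    <limit lower="-1.57" upper="1.57" effort="10" velocity="1"/>
  </joint>
</robot>"""

print(make_urdf(axis=(0.0, 0.0, 1.0), origin=(0.2, 0.0, 0.4)))
```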